ABSTRACT

The precise form of the foregrounds for sky-averaged measurements of the 21-cm line during 
and before the epoch of reionization is unknown. We suggest that the level of complexity in 
the foreground models used to fit global 21-cm data should be driven by the data, under a 
Bayesian model selection methodology. A first test of this approach is carried out by applying 
nested sampling to simplified models of global 21-cm data to compute the Bayesian evidence 
for the models. If the foregrounds are assumed to be polynomials of order $n$ in log-log space,
we can infer that $n = 4$ is required rather than $n = 3$ with < 2 h of integration and
limited frequency coverage, for reasonable values of the fourth-order coefficient.

Using a higher-order polynomial does not necessarily prevent a significant detection of
the 21-cm signal. Even for $n = 8$, we can obtain very strong evidence distinguishing a reasonable
model for the signal from a null model with 128 h of integration. More subtle features
of the signal may, however, be lost if the foregrounds are this complex. This is demonstrated
using a simpler model for the signal that only includes absorption.

The results highlight some pitfalls in trying to quantify the significance of a detection 
from errors on the parameters of the signal alone. 

Key words: methods: statistical - cosmology: theory - diffuse radiation - dark ages, reionization, first stars - radio lines: general.


1 INTRODUCTION 

The sky-averaged or ‘global’ signal from the 21-cm line of hydrogen at redshifts $z > 6$ has been put forward as a probe of reionization, the ‘cosmic dawn’ (first stars and galaxies, $z > 13$) and even the preceding ‘dark ages’ at $z > 30$ (Pritchard & Loeb 2012). It may complement interferometric measurements of 21-cm fluctuations, and allow higher redshifts to be studied more quickly.

A persistent concern, however, is that it may not be possible to separate the 21-cm signal from bright foregrounds, which include diffuse synchrotron and free-free radiation from our Galaxy as well as emission from extragalactic sources (Shaver et al. 1999). The problem is even more severe than for interferometric measurements, since the features of the global 21-cm signal extend over many MHz, while fluctuations along individual sightlines in interferometric maps are expected to decorrelate over a bandwidth of < 1 MHz (e.g. Bharadwaj & Ali 2005; Mellema et al. 2006). The global signal is thus likely to be more degenerate with the largely smooth foregrounds. Furthermore, it is difficult to obtain independent measurements of the foregrounds at high enough precision to be useful: interferometers may provide some insight but are sensitive to the spatially fluctuating part of the foregrounds (though see Vedantham et al. 2014b; Presley, Liu & Parsons 2015), while monopole observations which are sensitive enough to detect the 21-cm signal are also likely to be the deepest and best calibrated foreground measurements at the appropriate frequencies. The foregrounds and 21-cm signal must therefore be inferred simultaneously from the data.

The degeneracy between the foregrounds and the signal may limit the usefulness of the fully blind component separation methods which have been applied to interferometric data (e.g. Chapman et al. 2012, 2013). Instead, we might seek a framework which can incorporate stronger assumptions about the spatial structure and spectral smoothness of the foregrounds (Liu et al. 2013). This raises the question of how restrictive our foreground models must be or, alternatively, how complex a foreground model is required by the data.

In this letter, we adopt parametrized forms for the 21-cm signal and the frequency dependence of the foregrounds, and test
whether the Bayesian evidence could be useful both for selecting an 
appropriate foreground model, and for inferring the presence of a 
21-cm signal in the data given such a model. We consider synthetic 
data generated using only a highly simplified instrument model, 
but test whether preliminary measurements without the bandwidth 
or integration time of a full global signal experiment might be able 
to constrain the level of complexity present in the foregrounds. We 
then move on to consider constraints on the 21-cm signal itself, 
and the interplay between signal inferences and the order of the 
foregrounds. 

The parameters of the signal, foregrounds and instrument are described in Sec. 2, where we also briefly introduce the methods we use for computing the Bayesian evidence, and how the evidence is used to compare foreground models. The results of our evidence computations, for a range of different levels of foreground complexity and integration time, are given in Sec. 3 and discussed further in Sec. 4.


2 METHODS 

2.1 Signal and foreground modelling 

We consider an experiment which observes a single patch of sky for a length of time $t_{\rm obs}$. The noise on the measurement is purely thermal noise computed according to the radiometer equation, assuming an antenna with a flat frequency response and an efficiency of 85 per cent. In addition to the sky noise, there is a contribution from the receiver, which we take to be 226.2 K, though this is not intended to be representative of any particular experiment. We also assume that the only foregrounds present are smooth, diffuse Galactic foregrounds, and a sea of extragalactic sources that appear as a diffuse foreground at the resolution of proposed 21-cm global signal experiments. That is, we neglect sources such as the Sun and Moon, and contamination by anthropogenic radio frequency interference. This is motivated partly by the possibility of missions such as the Dark Ages Radio Explorer (DARE; Burns et al. 2012), which would avoid most foregrounds by taking data only while in a low orbit over the far side of the Moon, and partly by a desire to reduce the number of parameters in the model, in order to make large numbers of evidence computations feasible.
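The noise model described above can be sketched in a few lines. This is a minimal illustration, assuming the ideal radiometer equation $\sigma = T_{\rm sys}/\sqrt{\Delta\nu\, t_{\rm obs}}$ with $T_{\rm sys} = \eta T_{\rm sky} + T_{\rm rx}$; exactly how the 85 per cent efficiency and the 226.2 K receiver term are combined is not stated in the text, and the channel width and function names are hypothetical.

```python
import math
import random

ETA = 0.85    # antenna efficiency quoted in the text
T_RX = 226.2  # receiver temperature (K) quoted in the text


def radiometer_sigma(t_sky_k, channel_width_hz, t_obs_s, eta=ETA, t_rx_k=T_RX):
    """Per-channel thermal noise (K) from the ideal radiometer equation.

    Assumes T_sys = eta * T_sky + T_rx, an illustrative choice.
    """
    t_sys = eta * t_sky_k + t_rx_k
    return t_sys / math.sqrt(channel_width_hz * t_obs_s)


def add_thermal_noise(t_sky_k, channel_width_hz, t_obs_s, seed=0):
    """Return a noisy copy of a sky spectrum (list of temperatures in K)."""
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, radiometer_sigma(t, channel_width_hz, t_obs_s))
            for t in t_sky_k]


# Hypothetical example: a 1 MHz channel with T_sky = 1000 K
sigma_1h = radiometer_sigma(1000.0, 1e6, 3600.0)
sigma_4h = radiometer_sigma(1000.0, 1e6, 4 * 3600.0)
print(f"{sigma_1h * 1e3:.2f} mK -> {sigma_4h * 1e3:.2f} mK")  # halves when t_obs quadruples
```

The $t_{\rm obs}^{-1/2}$ scaling is why the integration times explored below increase in factors of two or four.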

The diffuse foregrounds are taken to have the form of a polynomial in log-log space ($\ln T$ against $\ln\nu$), i.e.

$$\ln T_{\rm FG} = \ln T_0 + \sum_{i=1}^{n} a_i \left[\ln(\nu/\nu_0)\right]^i , \tag{1}$$

where $\nu_0 = 80$ MHz is an arbitrary reference frequency, and $\{T_0, a_1, a_2, \ldots, a_n\}$ are the parameters of the model. By increasing $n$, we can study progressively more complex, less smooth foregrounds.
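Equation (1) translates directly into code. A minimal sketch, in which the function name is illustrative and `coeffs` holds $[a_1, \ldots, a_n]$, so that the polynomial order is simply its length:

```python
import math

NU0_MHZ = 80.0  # reference frequency nu_0 from the text


def foreground_temperature(nu_mhz, t0, coeffs, nu0_mhz=NU0_MHZ):
    """Evaluate eq. (1): ln T_FG = ln T_0 + sum_{i=1}^{n} a_i [ln(nu/nu_0)]^i.

    `coeffs` is [a_1, ..., a_n]; the polynomial order n is len(coeffs).
    """
    x = math.log(nu_mhz / nu0_mhz)
    ln_t = math.log(t0) + sum(a * x ** i for i, a in enumerate(coeffs, start=1))
    return math.exp(ln_t)


# Fiducial third-order coefficients quoted in Sec. 2.1
fiducial = [-2.42096, -0.08062, 0.02898]
for nu in (35.0, 80.0, 120.0):
    print(nu, round(foreground_temperature(nu, 2039.611, fiducial), 1))
```

At $\nu = \nu_0$ every power of $\ln(\nu/\nu_0)$ vanishes, so $T_{\rm FG} = T_0$; keeping only $a_1$ gives a pure power law with spectral index $a_1$.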

Where we include the 21-cm signal, it is parametrized as a cubic spline passing through a number of maxima and minima (turning points), following Pritchard & Loeb (2010). The frequency and brightness temperature of these turning points are the parameters of the signal model. We restrict our attention to frequencies of 35-120 MHz, and so we only fit the parameters of Pritchard & Loeb's turning points 1-3 (corresponding to the start of Lyα pumping, the start of effective heating, and signal saturation, respectively), leaving the positions of turning points 0 (in the true dark ages) and 4 (the end of reionization) fixed. We refer below to turning points 0-4 as A-E, respectively, to avoid confusion with subscripts.
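One simple way to build a curve through such turning points is sketched below. It assumes a piecewise cubic with zero slope at every node (a cubic Hermite interpolant), which guarantees that each node is a genuine extremum; the spline actually used by Pritchard & Loeb (2010) may be constructed differently, and the node values in the example are hypothetical rather than the fiducial model.

```python
from bisect import bisect_right


def turning_point_signal(nu_mhz, nodes):
    """Signal (mK) through turning points with zero slope at each node.

    `nodes` is a frequency-sorted list of (nu_MHz, dT_mK) pairs, e.g. the
    turning points A-E. Outside the node range the end value is returned.
    """
    freqs = [f for f, _ in nodes]
    if nu_mhz <= freqs[0]:
        return nodes[0][1]
    if nu_mhz >= freqs[-1]:
        return nodes[-1][1]
    k = bisect_right(freqs, nu_mhz) - 1   # segment with f_k <= nu < f_{k+1}
    (f0, t0), (f1, t1) = nodes[k], nodes[k + 1]
    s = (nu_mhz - f0) / (f1 - f0)         # position within the segment, 0..1
    h = 3 * s ** 2 - 2 * s ** 3           # h(0)=0, h(1)=1, zero slope at both ends
    return t0 + (t1 - t0) * h


# Hypothetical turning points A-E (frequency in MHz, brightness temperature in mK)
nodes = [(15.0, -40.0), (46.0, -5.0), (70.0, -150.0), (100.0, 20.0), (200.0, 0.0)]
signal = [turning_point_signal(nu, nodes) for nu in range(35, 121)]
```

Fitting only B-D then means treating three of the five `(frequency, temperature)` pairs as free parameters while A and E stay fixed.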

When we simulate the noisy spectrum, we assume that $\{T_0, a_1, a_2, a_3\} = \{2039.611, -2.42096, -0.08062, 0.02898\}$, computed by fitting a third-order polynomial over 35-120 MHz to a quiet region of the global sky model (GSM) of de Oliveira-Costa et al. (2008), convolved with a beam with a full width at half-maximum of 72°. Higher-order coefficients are varied as described in Sections 3.1 and 3.2. For the signal, we assume the same turning point positions as the fiducial model of Pritchard & Loeb (2010). This leads to the input signal shown in Fig. 3.
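The text does not say how the fiducial coefficients were fitted; a plausible reading is an unweighted least-squares fit of equation (1) in log-log space. The sketch below assumes that, solving the normal equations directly, which is adequate for a low-order fit on a narrow band but is not a substitute for a numerically robust library solver. The function name and the 1 MHz grid are hypothetical; the example checks that noiseless data generated from the quoted coefficients are recovered.

```python
import math


def fit_log_polynomial(nus_mhz, temps_k, order, nu0_mhz=80.0):
    """Unweighted least-squares fit of eq. (1) in log-log space.

    Returns (T_0, [a_1, ..., a_order]). Solves the normal equations by
    Gaussian elimination with partial pivoting; fine for low orders on a
    narrow band, but a QR-based solver is more robust in general.
    """
    xs = [math.log(nu / nu0_mhz) for nu in nus_mhz]
    ys = [math.log(t) for t in temps_k]
    m = order + 1
    # Normal equations: sum_k (sum x^(j+k)) c_k = sum y x^j, for j = 0..order
    mat = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    rhs = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(mat[r][col]))
        mat[col], mat[piv] = mat[piv], mat[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, m):
            f = mat[r][col] / mat[col][col]
            for c in range(col, m):
                mat[r][c] -= f * mat[col][c]
            rhs[r] -= f * rhs[col]
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):  # back substitution
        coef[r] = (rhs[r] - sum(mat[r][c] * coef[c] for c in range(r + 1, m))) / mat[r][r]
    return math.exp(coef[0]), coef[1:]


# Round trip on noiseless data generated from the quoted coefficients
true_t0, true_a = 2039.611, [-2.42096, -0.08062, 0.02898]
nus = [35.0 + k for k in range(86)]  # 35-120 MHz in 1 MHz steps
temps = [true_t0 * math.exp(sum(a * math.log(nu / 80.0) ** i
                                for i, a in enumerate(true_a, start=1)))
         for nu in nus]
t0_fit, a_fit = fit_log_polynomial(nus, temps, order=3)
```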

2.2 Evidence computation 

We use a slightly modified version of MULTINEST v3.2 (Feroz & Hobson 2008; Feroz, Hobson & Bridges 2009; Feroz et al. 2013), which implements nested sampling (Skilling 2004) to compute the Bayesian evidence, $Z$. This also yields weighted samples of the posterior probability distribution of the parameters given the data. Uniform priors are used for the turning point frequencies and brightness temperatures. For $T_0$, we assume a Gaussian prior with mean and standard deviation both equal to the ‘true’ $T_0$, truncated at zero. For $a_1$ (the spectral index at $\nu_0$), we assume a Gaussian prior with a mean of the ‘true’ $a_1$ and a standard deviation of 0.1, while for all other $a_i$ we use a Gaussian prior with mean 0 and standard deviation 0.1. We adopt this value because fourth-order polynomial fits to the GSM yield values of $a_4$ between -0.024 and 0.037 in individual pixels, a range that shrinks by around an order of magnitude after smoothing with a beam of a typical size for global 21-cm experiments. We would expect higher-order terms to be smaller.
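A toy version of the evidence calculation may help make the procedure concrete. The sketch below is not MULTINEST: MULTINEST draws replacement points from ellipsoidal bounds on the live points, whereas this assumes brute-force rejection from the whole prior, which is only workable for cheap, low-dimensional problems. It does follow MULTINEST's convention of mapping points in the unit hypercube to parameters, and includes transforms for the Gaussian and zero-truncated Gaussian priors described above. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def log_add_exp(a, b):
    """Numerically stable ln(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def gaussian_prior(u, mean, sd):
    """Map u in (0, 1) to a Gaussian-distributed parameter (inverse CDF)."""
    return NormalDist(mean, sd).inv_cdf(u)


def truncated_gaussian_prior(u, mean, sd, lower=0.0):
    """Map u in (0, 1) to a Gaussian truncated below at `lower` (as for T_0)."""
    dist = NormalDist(mean, sd)
    c_lo = dist.cdf(lower)
    return dist.inv_cdf(c_lo + u * (1.0 - c_lo))


def nested_sampling_log_z(log_like, prior_transform, ndim, n_live=100,
                          tol=1e-2, seed=0, max_iter=100_000):
    """Toy nested sampling estimate of ln Z using rejection from the prior."""
    rng = random.Random(seed)

    def new_point():
        # rng.random() lies in [0, 1); map the (probability ~2**-53) value 0.0
        # to 0.5 so that inverse-CDF transforms never see exactly zero
        cube = [rng.random() or 0.5 for _ in range(ndim)]
        theta = prior_transform(cube)
        return theta, log_like(theta)

    live = [new_point() for _ in range(n_live)]
    log_z, log_x_prev = -math.inf, 0.0
    for i in range(1, max_iter + 1):
        worst = min(range(n_live), key=lambda k: live[k][1])
        logl_star = live[worst][1]
        log_x = -i / n_live  # expected shrinkage of the enclosed prior volume
        # Weight of the discarded point: prior volume of the shell X_{i-1} - X_i
        log_w = math.log(math.exp(log_x_prev) - math.exp(log_x))
        log_z = log_add_exp(log_z, logl_star + log_w)
        log_x_prev = log_x
        while True:  # draw from the prior subject to L > L*
            cand = new_point()
            if cand[1] > logl_star:
                break
        live[worst] = cand
        # Stop once the remaining volume cannot change Z appreciably
        if max(p[1] for p in live) + log_x < log_z + math.log(tol):
            break
    # The final live points share the remaining prior volume equally
    for _, logl in live:
        log_z = log_add_exp(log_z, logl + log_x_prev - math.log(n_live))
    return log_z


if __name__ == "__main__":
    # Toy check: uniform prior on [-5, 5] and a unit Gaussian likelihood,
    # so Z is very close to 1/10
    def log_like(theta):
        return -0.5 * theta[0] ** 2 - 0.5 * math.log(2.0 * math.pi)

    def to_uniform(cube):
        return [-5.0 + 10.0 * cube[0]]

    print(nested_sampling_log_z(log_like, to_uniform, ndim=1), math.log(0.1))
    # Prior transforms in the style described above (values from Sec. 2.1)
    print(truncated_gaussian_prior(0.5, 2039.611, 2039.611),
          gaussian_prior(0.5, -2.42096, 0.1))
```

For the toy check, the estimate should lie within roughly $\sqrt{H/N} \approx 0.1$ of $\ln 0.1$, where $H$ is the information and $N$ the number of live points; evidences for competing models computed this way can then be differenced to give $\Delta\ln Z$.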


3 RESULTS 

3.1 Foreground inference with limited frequency coverage

We start with a test which considers only foregrounds. Data are simulated using the instrument model described in Sec. 2.1, but only in the ranges 40-50, 75-85 and 110-120 MHz, though we do fit all three segments of the spectrum simultaneously. For this test, we consider only short integration times. This emulates an experiment with limited scope, which does not attempt to cover the whole range from 40 to 120 MHz with a single antenna having a smooth frequency response, which is technically challenging. $\{T_0, a_1, a_2, a_3\}$ remain fixed, but we simulate data for different values of $a_4$ (between zero and 0.01 in steps of 0.001). In each case, we attempt to fit the data using a third-order and a fourth-order polynomial model, in order to test whether or not the addition of $a_4$ to the parameter set is justified by the Bayesian evidence. The results, expressed in terms of the evidence ratio (or difference in log-evidence, $\Delta\ln Z$) between the third- and fourth-order models, are shown in Fig. 1. Differences in $2\Delta\ln Z$ of 2, 6 and 10 correspond to borders between the categories of ‘not worth more than a bare mention’, ‘positive’, ‘strong’ and ‘very strong’ evidence for one model over the other, according to the guidelines of Kass & Raftery (1995).
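These thresholds are easy to apply mechanically. A small helper, assuming (since the text does not say) that a value falling exactly on a threshold belongs to the higher category; the function name and the log-evidence values in the example are hypothetical:

```python
def kass_raftery(ln_z_1, ln_z_2):
    """Classify 2 * (ln Z_1 - ln Z_2) using the thresholds 2, 6 and 10.

    Returns (2 * Delta ln Z, category, favoured model: 1, 2 or None).
    """
    two_dlnz = 2.0 * (ln_z_1 - ln_z_2)
    size = abs(two_dlnz)
    if size < 2:
        category = "not worth more than a bare mention"
    elif size < 6:
        category = "positive"
    elif size < 10:
        category = "strong"
    else:
        category = "very strong"
    favoured = 1 if two_dlnz > 0 else 2 if two_dlnz < 0 else None
    return two_dlnz, category, favoured


# e.g. a fourth-order fit (model 1) against a third-order fit (model 2)
print(kass_raftery(-100.0, -104.0))  # (8.0, 'strong', 1)
```

Working with the difference of log-evidences, rather than the ratio itself, avoids overflow when $|\Delta\ln Z|$ is large.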

If $a_4 = 0.01$ (note that $|a_4|$ is larger than this for many pixels in our GSM), we achieve strong evidence for a non-zero $a_4$ in only 7.5 min of integration. With $t_{\rm obs} = 1$ h, we obtain very strong evidence against the simpler, third-order model for $a_4 > 0.004$. That is, reasonable levels of foreground complexity can be constrained with a brief observation of limited frequency coverage (assuming it is sufficiently well calibrated), much less than is required to detect the 21-cm signal.
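A rough way to see why so little time is needed is to ask how much of an $a_4$ term a third-order fit cannot absorb. The sketch below is a linearised, noise-unweighted estimate (an assumption for illustration, not the procedure used for Fig. 1): it projects $[\ln(\nu/\nu_0)]^4$ onto the span of lower powers over the three observed segments, and the remainder, scaled by $a_4 T_{\rm FG}$, approximates the residual in kelvin that only a fourth-order term can explain. The 1 MHz channel spacing and the function names are hypothetical.

```python
import math

SEGMENTS_MHZ = [(40.0, 50.0), (75.0, 85.0), (110.0, 120.0)]  # from Sec. 3.1
NU0_MHZ = 80.0


def sample_frequencies(segments, step_mhz=1.0):
    """Channel centres across the segments (the 1 MHz step is hypothetical)."""
    nus = []
    for lo, hi in segments:
        n = int(round((hi - lo) / step_mhz))
        nus.extend(lo + k * step_mhz for k in range(n + 1))
    return nus


def unabsorbed_quartic(xs, lower_order=3):
    """Part of x**4 orthogonal (over the samples) to 1, x, ..., x**lower_order."""
    def project_out(v, basis):
        for b in basis:
            dot = sum(p * q for p, q in zip(v, b))
            v = [p - dot * q for p, q in zip(v, b)]
        return v

    basis = []  # orthonormal basis built by modified Gram-Schmidt
    for k in range(lower_order + 1):
        v = project_out([x ** k for x in xs], basis)
        norm = math.sqrt(sum(p * p for p in v))
        basis.append([p / norm for p in v])
    return project_out([x ** 4 for x in xs], basis)


def quartic_residual_kelvin(a4, t_fg, segments=SEGMENTS_MHZ, nu0_mhz=NU0_MHZ):
    """Linearised residual (K) that a third-order fit leaves from an a4 term."""
    nus = sample_frequencies(segments)
    r = unabsorbed_quartic([math.log(nu / nu0_mhz) for nu in nus])
    # A small change a4 * r in ln T changes T by roughly T * a4 * r
    return nus, [t_fg(nu) * a4 * ri for nu, ri in zip(nus, r)]


def fiducial_fg(nu):
    """Fiducial third-order foreground from Sec. 2.1."""
    x = math.log(nu / NU0_MHZ)
    return 2039.611 * math.exp(-2.42096 * x - 0.08062 * x ** 2 + 0.02898 * x ** 3)


nus, resid = quartic_residual_kelvin(0.01, fiducial_fg)
print(f"max |residual| for a4 = 0.01: {max(abs(r) for r in resid):.2f} K")
```

Comparing this residual with the per-channel radiometer noise gives a back-of-the-envelope sense of the integration time needed, though the full evidence calculation also accounts for noise weighting and parameter degeneracies.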

Note also that for $a_4 = 0$, the evidence always (correctly) favours the third-order foreground model, with longer integrations producing stronger evidence for the simpler model. The evidence never becomes conclusive, however, even for the very long integrations (not shown) that we have run as test cases. It seems unlikely that we could confidently use only a third-order model for the foregrounds in a full 21-cm experiment.

3.2 Signal inferences with complete frequency coverage 

We now move on to considering a more ambitious and challenging experiment aimed at detecting the 21-cm signal itself. We simulate data over the complete range from 35 to 120 MHz and include the cosmological signal in the simulations. In Fig. 2, we test how much integration time is required to obtain a detection in the presence of different levels of foreground complexity. $\{T_0, a_1, a_2, a_3\}$ remain fixed, as before, but the order $n$ of the simulated foregrounds is varied. We take $a_n = 0.001$, and $a_i = 0$ for $3 < i < n$, and the fitting is done using the same $n$ as the simulation. We plot $2\Delta\ln Z$ between a model including a 21-cm signal and one without, as a function of integration time, for $3 \le n \le 8$.

Figure 1. We show $2[\ln(Z_4) - \ln(Z_3)]$, where $Z_3$ and $Z_4$ are the evidence for a third- and fourth-order polynomial foreground model, respectively, for the simulation setup described in Sec. 3.1. This is plotted as a function of $a_4$, the coefficient of the fourth-order term in the foreground model. The different curves are for observations with different amounts of observing time, as shown. Error bars are shown only on the $t_{\rm obs} = 0.125$ h curve, for clarity, but the errors for the other curves are similar. Dotted lines at $2\Delta\ln Z = 0, 2, 6$ (and the axis limits at 10) show typical thresholds used to assess the degree to which the data favour one model over another (Kass & Raftery 1995).

Figure 2. The evidence for the presence of a signal in the data, $2[\ln(Z_{\rm with}) - \ln(Z_{\rm without})]$, for different levels of foreground complexity, is shown as a function of the integration time, $t_{\rm obs}$. In computing $Z_{\rm with}$ we assume a 21-cm signal, parametrized by its turning points, is present in the data, while in computing $Z_{\rm without}$, no signal is included in the model. In all cases, the ‘correct’ order, $n$, of the foreground model was assumed. In generating the synthetic data, we used $a_n = 0.001$, and $a_i = 0$ for $3 < i < n$, and the experimental setup described in Sec. 3.2. Error bars are shown only on the eighth-order model curve, for clarity, but are similar for all the other curves. Dotted lines show typical thresholds in $2\Delta\ln Z$ used to assess the degree to which the data favour one model over another.

With only third-order foregrounds, there is very strong evidence for the presence of a 21-cm signal within 2 h of integration, but such a low-order model is likely to be unrealistic (Vedantham et al. 2014a). All higher orders require at least 8 h. The increase does not appear to be monotonic, with $n = 4$ requiring more time than $n = 5$ or 6. This anomaly comes about because the foreground with $a_4 = 0.001$ partially mimics the signal, which thus requires more time to distinguish. We have confirmed this by rerunning the $n = 5$ and $n = 6$ simulations using $a_4 = 0.001$ rather than $a_4 = 0$, in which case the increase becomes monotonic, as expected. This highlights the fact that even if the parametrized model for the foregrounds is correct, it is possible to be unlucky with the values these parameters take, increasing the time required for a detection.

For $n > 6$, the behaviour changes, and the increase in $\Delta\ln Z$ with $t_{\rm obs}$ becomes less steep, perhaps suggesting that degeneracies between the foreground and signal models are starting to become more important. Nonetheless, with $t_{\rm obs} = 128$ h, there is significant evidence for a signal even with $n = 8$, the order which Bernardi, McQuinn & Greenhill (2015) found was required to extract unbiased estimates of the signal parameters in their modelling, which included structure introduced by the antenna response.

Even a very significant detection may yield parameter constraints which are not straightforward to interpret, however, as we show in Fig. 3. Here, we show the credible regions for the position (in frequency and brightness temperature) of the three turning points lying in our frequency range, overlaid on a plot of the input signal. The constraints are taken from the $t_{\rm obs} = 128$ h realization with $n = 3$, so the 21-cm signal is detected conclusively.

The frequency of each turning point is measured reasonably well, apart perhaps from the low-frequency turning point at 46.2 MHz, for which the contours do not close within our frequency range, suggesting that only an upper limit could be measured. The amplitude errors are large, however. For example, the brightness temperature of the absorption minimum, the largest and most easily detected feature in the signal, is measured at $-176.6 \pm 42.3$ mK. This might be labelled a ‘4σ detection’, even though the Bayesian evidence suggests that a 21-cm signal is present at much greater confidence. One reason is the difficulty in constraining the overall zero-point of the signal, as demonstrated by the inset panel in Fig. 3. This shows the joint constraints on the brightness temperature of this feature and that of the emission maximum corresponding to the start of reionization. The difference between these two temperatures (and thus the overall shape of the signal) is measured much more precisely than either on its own. Care is therefore required in translating measurements of the parameters of a fitting function to physical quantities of interest, as in e.g. Mirocha, Harker & Burns (2013): ideally, we would constrain the parameters of a physical model directly, rather than passing through an intermediate fitting function.
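The arithmetic behind the point about the zero-point can be made explicit. For jointly Gaussian estimates, the variance of a difference is $\sigma_C^2 + \sigma_D^2 - 2\rho\,\sigma_C\sigma_D$, so a strong positive correlation $\rho$ makes the difference far better determined than either amplitude. In this sketch only the turning point C value is from the text; $\sigma_D$ and $\rho$ are hypothetical, chosen merely to mimic a strong correlation like the one in the inset.

```python
import math


def sigma_of_difference(sigma_c, sigma_d, rho):
    """Standard deviation of (dT_D - dT_C) for jointly Gaussian estimates."""
    var = sigma_c ** 2 + sigma_d ** 2 - 2.0 * rho * sigma_c * sigma_d
    return math.sqrt(max(var, 0.0))  # guard against tiny negative round-off


t_c, sigma_c = -176.6, 42.3   # turning point C amplitude and error from the text (mK)
sigma_d, rho = 40.0, 0.95     # hypothetical error on D and C-D correlation

print(f"naive significance of C: {abs(t_c) / sigma_c:.1f} sigma")              # 4.2 sigma
print(f"sigma(D - C) = {sigma_of_difference(sigma_c, sigma_d, rho):.1f} mK")   # 13.2 mK
```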

In Fig. 4, we show the effect of integration time on simultaneous constraints on the foreground order and the signal model. As part of this, we introduce a simplified signal model in which the signal never goes into emission: the high-frequency maximum in the signal (turning point D, at ~100 MHz in Fig. 3) has its amplitude fixed to zero, so that the signal is zero at all higher frequencies. The frequency at which the signal reaches zero is still allowed to vary. In this model, the intergalactic medium reionizes while it is still cold. Since only an absorption trough is present, it is somewhat similar in spirit to the Gaussian signal model used by Bernardi, McQuinn & Greenhill (2015). For the solid lines in Fig. 4, the data are simulated with fourth-order foregrounds, while for the dashed lines they are simulated with eighth-order foregrounds. The subscripts in the legend show the polynomial order assumed in the fit.

Figure 3. Constraints on the positions of the three fitted turning points (maxima and minima) in the 21-cm signal model for a third-order polynomial foreground model and 128 h of integration, with the experimental setup described in Sec. 3.2. The blue curve shows the input 21-cm signal spectrum used to generate the synthetic data, while the red, filled contours show the 68 and 95 per cent credible regions on the frequency and amplitude of each turning point. The inset shows the very strong correlation between the inferred amplitudes of the two higher-frequency turning points (C and D), which shows that the signal is quite well constrained up to an overall additive normalization. Although the evidence for the presence of a signal is overwhelming ($2\Delta\ln Z = 1312$; recall that values > 10 are considered ‘very strong’ evidence), the amplitude of turning point C (the deepest point of the trough in the middle of the band) is only a few standard deviations away from zero, demonstrating that this is a poor measure of the significance with which a signal is detected. The odd shape of the contours at low $\delta T_{\rm b}$ is because they are cut off by the priors: lower values would be unphysical since they would imply a Universe cooling faster than adiabatically.
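The absorption-only model amounts to a constraint on the turning-point curve. The sketch below assumes a piecewise cubic with zero slope at each turning point (one possible construction, not necessarily the spline used here): D's amplitude is pinned to zero while its frequency stays free, turning point E is no longer needed, and the signal is identically zero above $\nu_D$. Function names and node values are hypothetical.

```python
from bisect import bisect_right


def zero_slope_cubic(nu, nodes):
    """Piecewise cubic through (nu_k, dT_k) nodes with zero slope at each node."""
    freqs = [f for f, _ in nodes]
    if nu <= freqs[0]:
        return nodes[0][1]
    if nu >= freqs[-1]:
        return nodes[-1][1]
    k = bisect_right(freqs, nu) - 1
    (f0, t0), (f1, t1) = nodes[k], nodes[k + 1]
    s = (nu - f0) / (f1 - f0)
    return t0 + (t1 - t0) * (3 * s ** 2 - 2 * s ** 3)


def simple_signal(nu, a, b, c, nu_d):
    """Absorption-only model: turning point D is fixed at (nu_d, 0 mK).

    a, b, c are (nu_MHz, dT_mK) pairs for turning points A-C; only the
    frequency nu_d of D is free, and the signal is zero for nu >= nu_d.
    """
    if nu >= nu_d:
        return 0.0
    return zero_slope_cubic(nu, [a, b, c, (nu_d, 0.0)])


# Hypothetical turning points A-C and a free D frequency of 100 MHz
a, b, c = (15.0, -40.0), (46.0, -5.0), (70.0, -150.0)
trough = [simple_signal(nu, a, b, c, 100.0) for nu in range(35, 121)]
```

Compared with the full model, this removes one free amplitude, which is why the evidence can distinguish the two only if the data constrain the emission maximum.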

The blue, solid line shows the evidence ratio between fits assuming fourth-order and third-order foregrounds, when the data are simulated with the fiducial 21-cm signal and with $a_4 = 0.001$, and where the parameters of the full signal are fitted for along with the foreground parameters. The inclusion of the signal does not prevent a significant detection of foreground complexity, for which there is very strong evidence with < 2 h of integration. If no signal is present or assumed, the evidence is very strong even for 0.5 h, so we do not include these lines, to avoid having to compress the scale of the plot.

The solid green line is identical to the line of the same style in Fig. 2, and shows how well the presence of a signal can be inferred for fourth-order foregrounds. It is almost overplotted by the magenta line, which shows the evidence ratio between a fit including the simple signal (ss) model and the null model (note that the data were still simulated assuming the full signal model). This similarity shows that assuming a slightly incorrect signal model may not be too harmful to a detection. The dashed green line reproduces the yellow line from Fig. 2 and shows again the effect of increasing the order of the foreground model on the integration time required for a detection.

Figure 4. The interplay between inference of the foreground order and the signal model. For all the solid lines, data are simulated for a fourth-order polynomial model with $a_4 = 0.001$, and the full signal model. Different lines show $2\ln r$, where $r$ is the evidence ratio given in the legend. The subscripts of $Z$ in the legend show the polynomial order used to fit the foregrounds, while the superscripts label three different signal models: a null signal model (ns), a simple signal model (ss) in which the amplitude of turning point D is fixed to be zero, and the full signal model (sig). Note that the $Z_4^{\rm ss}/Z_4^{\rm ns}$ line with error bars almost overplots the $Z_4^{\rm sig}/Z_4^{\rm ns}$ line. The two dashed lines are analogous to the two solid lines of the same colour: in each, the data are simulated for eighth-order foregrounds with the full signal, and we show $2\ln r$ for the cases given in the legend.

Distinguishing the full signal from a simple signal is much more difficult, however. The cyan, solid red and dashed red lines show the evidence ratio between a fit using the full signal (including the emission maximum) and one using the simple signal (absorption only), for $n = 3$, 4 and 8, respectively. For $n = 3$, the full signal is very strongly favoured over the simple signal within 4 h. This detection may be spurious, however, since for $n = 4$ (the ‘correct’ order) it requires more than 256 h to achieve. Care is clearly required in choosing an appropriate foreground model. For $n = 8$, meanwhile, even 1024 h are insufficient to distinguish the full signal from the simple signal. The absorption trough is clearly the outstanding feature in our fiducial model. To detect more subtle features in the signal, such as a broad emission maximum expected in many reionization models, long integrations will be necessary, but not sufficient. Chromatic effects from the instrument, which led to the eighth-order fits required by Bernardi, McQuinn & Greenhill (2015), or effects from the ionosphere, will also have to be very tightly controlled or eliminated.


4 DISCUSSION AND CONCLUSIONS 

In this letter, we have started to make the case for applying a Bayesian model selection methodology to the measurement of the foregrounds and the cosmological signal in global 21-cm experiments. This will allow the data to inform us about the appropriate level of complexity for our foreground model, and provides a more rigorous means of quantifying our confidence in any detection of the 21-cm signal. We have applied nested sampling to simplified realizations of spectra from global 21-cm measurements, using a polynomial foreground model (in log-log space) and a simple parametric form for the 21-cm signal. The framework is easily able to incorporate other components and different parametrizations, however. For example, it can be used to select between 21-cm signal models, rather than simply distinguishing them from a null signal.

If this methodology is applied to observational data, it may be necessary to include instrumental parameters, and to deal with spectra from multiple sky regions simultaneously, as in the Markov Chain Monte Carlo approach of Harker et al. (2012). This will greatly increase the number of parameters required to describe the data. This increase in dimensionality is especially concerning given the exponential scaling of computational cost with number of parameters found by Allison & Dunkley (2014), and has caused us problems in extending our analysis to multiple sky regions using MULTINEST. Different algorithms to compute the evidence may be required. For ground-based 21-cm experiments, we may also need to include terms for the emission and absorption from the ionosphere in our model (e.g. Rogers et al. 2014).

Constraining the parameters of a physical model directly, rather than using a simple fitting function for the 21-cm signal, also increases the computational requirements (despite the development of efficient codes, e.g. Mirocha 2014), and raises the question of whether there might be a better parametrization than the ‘turning points’ model used here, in the sense of being less degenerate with the foreground model while retaining the maximum amount of information about physical quantities. This is a topic of ongoing study.